Assignment 3: Exploring Evictions and Code Violations in Philadelphia

Due date: start of class Wednesday, 2/27

In this assignment, we'll explore spatial trends evictions in Philadelphia using data from the Eviction Lab and building code violations using data from OpenDataPhilly.

We'll be exploring the idea that evictions can occur as retaliation against renters for reporting code violations. Spatial correlations between evictions and code violations from the City's Licenses and Inspections department can offer some insight into this question.

A couple of interesting background readings:

1. Explore Eviction Lab Data

The Eviction Lab built the first national database for evictions. If you aren't familiar with the project, you can explore their website: https://evictionlab.org/

1.1 Read data using geopandas

The first step is to read the eviction data by census tract using geopandas. The data for all of Pennsylvania by census tract can be downloaded in a GeoJSON format using the following url:

https://eviction-lab-data-downloads.s3.amazonaws.com/PA/tracts.geojson

A browser-friendly version of the data is available here: https://data-downloads.evictionlab.org/

In [1]:
# import dependencies
import pandas as pd
import geopandas as gpd
import hvplot.pandas
import cartopy.crs as ccrs
In [2]:
# read the downloaded geojson file
tracts = gpd.read_file("tracts.geojson")
In [3]:
tracts.head()
Out[3]:
GEOID west south east north n pl p-00 pr-00 roh-00 ... pm-16 po-16 ef-16 e-16 er-16 efr-16 lf-16 imputed-16 subbed-16 geometry
0 42003412002 -80.1243 40.5422 -80.0640 40.5890 4120.02 Allegheny County, Pennsylvania 4748.59 0.88 58.0 ... 0.00 0.0 0.0 0.0 0.00 0.00 1.0 0.0 1.0 (POLYGON ((-80.06670099999999 40.584012, -80.0...
1 42003413100 -80.0681 40.5850 -79.9906 40.6143 4131 Allegheny County, Pennsylvania 6771.01 3.47 729.0 ... 1.59 0.0 12.0 2.0 0.27 1.62 1.0 0.0 1.0 (POLYGON ((-80.068057 40.612536, -80.054520999...
2 42003413300 -80.0657 40.5527 -80.0210 40.5721 4133 Allegheny County, Pennsylvania 5044.59 2.99 119.0 ... 0.95 0.0 4.0 1.0 0.49 1.96 1.0 0.0 1.0 (POLYGON ((-80.03821600000001 40.553495, -80.0...
3 42003416000 -79.8113 40.5440 -79.7637 40.5630 4160 Allegheny County, Pennsylvania 1775.93 4.99 121.0 ... 0.55 0.0 1.0 1.0 0.65 0.65 1.0 0.0 1.0 (POLYGON ((-79.765946 40.550915, -79.765415 40...
4 42003417200 -79.7948 40.5341 -79.7642 40.5443 4172 Allegheny County, Pennsylvania 1428.03 11.95 321.0 ... 0.00 0.0 7.0 3.0 0.82 1.90 1.0 0.0 1.0 (POLYGON ((-79.771137 40.544153, -79.764172 40...

5 rows × 399 columns

1.2 Explore and trim the data

We will need to trim data to Philadelphia only. Take a look at the data dictionary for the descriptions of the various columns: https://eviction-lab-data-downloads.s3.amazonaws.com/DATA_DICTIONARY.txt

Note: the column names are shortened — see the end of the above file for the abbreviations. The numbers at the end of the columns indicate the years. For example, e-16 is the number of evictions in 2016.

Take a look at the individual columns and trim to census tracts in Philadelphia. (Hint: Philadelphia is both a city and a county).

In [4]:
philly = tracts.loc[tracts['pl'] == 'Philadelphia County, Pennsylvania']
philly.head()
Out[4]:
GEOID west south east north n pl p-00 pr-00 roh-00 ... pm-16 po-16 ef-16 e-16 er-16 efr-16 lf-16 imputed-16 subbed-16 geometry
435 42101000100 -75.1523 39.9481 -75.1415 39.9569 1 Philadelphia County, Pennsylvania 2646.71 9.26 1347.0 ... 2.49 0.00 25.0 16.0 0.93 1.45 0.0 0.0 1.0 (POLYGON ((-75.14160699999999 39.955491, -75.1...
436 42101000200 -75.1631 39.9529 -75.1511 39.9578 2 Philadelphia County, Pennsylvania 1362.00 56.42 374.0 ... 2.27 0.00 11.0 8.0 0.95 1.30 0.0 0.0 1.0 (POLYGON ((-75.151223 39.956862, -75.151669 39...
437 42101000300 -75.1798 39.9544 -75.1623 39.9599 3 Philadelphia County, Pennsylvania 2570.00 12.16 861.0 ... 1.76 0.00 26.0 14.0 0.73 1.35 0.0 0.0 1.0 (POLYGON ((-75.162339 39.957825, -75.162374 39...
438 42101000801 -75.1833 39.9486 -75.1773 39.9515 8.01 Philadelphia County, Pennsylvania 1478.00 14.40 810.0 ... 1.42 3.78 13.0 4.0 0.51 1.64 0.0 0.0 1.0 (POLYGON ((-75.177323 39.950964, -75.177843 39...
439 42101000804 -75.1712 39.9470 -75.1643 39.9501 8.04 Philadelphia County, Pennsylvania 3301.00 14.40 2058.0 ... 0.19 0.35 22.0 7.0 0.33 1.04 0.0 0.0 1.0 (POLYGON ((-75.17118000000001 39.947784, -75.1...

5 rows × 399 columns

1.3 Transform from wide to tidy format

For this assignment, we are interested in the number of evictions by census tract for various years. Right now, each year has it's own column, so it will be easiest to transform to a tidy format.

Use the pd.melt() function to transform the eviction data into tidy format, using the number of evictions from 2003 to 2016.

The tidy data frame should have four columns: GEOID, geometry, a column holding the number of evictions, and a column telling you what the name of the original column was for that value.

Hints:

  • You'll want to specify the GEOID and geometry columns as the id_vars. This will keep track of the census tract information.
  • You should specify the names of the columns holding the number of evictions as the value_vars.
  • You can generate a list of this column names using [Python's string formatting]:(https://docs.python.org/3.7/library/string.html#format-examples)
    value_vars = ['e-{:02d}'.format(x) for x in range(3, 17)]
    
In [5]:
e_years = ['e-{:02d}'.format(x) for x in range(3, 17)]
print(e_years)
['e-03', 'e-04', 'e-05', 'e-06', 'e-07', 'e-08', 'e-09', 'e-10', 'e-11', 'e-12', 'e-13', 'e-14', 'e-15', 'e-16']
In [6]:
evictions = pd.melt(philly, id_vars=['GEOID', 'geometry'], value_vars=e_years, var_name='year', value_name='evictions_count')
evictions.head()
Out[6]:
GEOID geometry year evictions_count
0 42101000100 (POLYGON ((-75.14160699999999 39.955491, -75.1... e-03 21.0
1 42101000200 (POLYGON ((-75.151223 39.956862, -75.151669 39... e-03 3.0
2 42101000300 (POLYGON ((-75.162339 39.957825, -75.162374 39... e-03 17.0
3 42101000801 (POLYGON ((-75.177323 39.950964, -75.177843 39... e-03 13.0
4 42101000804 (POLYGON ((-75.17118000000001 39.947784, -75.1... e-03 21.0
In [7]:
# change the format of year
evictions['year'] = evictions['year'].apply(lambda x: '20{}'.format(x[-2:]))
evictions.head()
Out[7]:
GEOID geometry year evictions_count
0 42101000100 (POLYGON ((-75.14160699999999 39.955491, -75.1... 2003 21.0
1 42101000200 (POLYGON ((-75.151223 39.956862, -75.151669 39... 2003 3.0
2 42101000300 (POLYGON ((-75.162339 39.957825, -75.162374 39... 2003 17.0
3 42101000801 (POLYGON ((-75.177323 39.950964, -75.177843 39... 2003 13.0
4 42101000804 (POLYGON ((-75.17118000000001 39.947784, -75.1... 2003 21.0

1.4 Plot the total number of evictions per year from 2003 to 2016

Use hvplot to plot the total number of evictions from 2003 to 2016. You will first need to perform a group by operation and sum up the total number of evictions for all census tracts, and then use hvplot() to make your plot.

You can use any type of hvplot chart you'd like to show the trend in number of evictions over time.

In [8]:
total = evictions.groupby(['year'])['evictions_count'].sum()
total.head()
Out[8]:
year
2003    10647.0
2004    10491.0
2005    10550.0
2006    11078.0
2007    11032.0
Name: evictions_count, dtype: float64
In [9]:
total.hvplot(kind='line')
Out[9]:

1.5 The number of evictions across Philadelphia

Our tidy data frame is still a GeoDataFrame with a geometry column, so we can visualize the number of evictions for all census tracts.

Use hvplot() to generate a choropleth showing the number of evictions for a specified year, with a widget dropdown to select a given year (or variable name, e.g., e-16, e-15, etc).

Hints

  • You'll need to use the groupby keyword to tell hvplot to make a series of maps, with a widget to select between them.
  • You will need to specify dynamic=False as a keyword argument to the hvplot() function.
  • Be sure to specify a width and height that makes your output map (roughly) square to limit distortions
In [10]:
evictions.hvplot(c='evictions_count', groupby='year', crs=3857, width=500, height=430, dynamic=False)
Out[10]:

2. Code Violations in Philadelphia

Next, we'll explore data for code violations from the Licenses and Inspections Department of Philadelphia to look for potential correlations with the number of evictions.

We'll be pulling data directly from the CARTO database for the L&I Violations dataset. API information and metadata is available here:

2.1 Count the number of rows in the database

First, use the carto package to query the database and count the total number of rows.

In [11]:
from pyrestcli.auth import NoAuthClient
from carto.sql import SQLClient

API_endpoint = "https://phl.carto.com"
sql_client = SQLClient(NoAuthClient(API_endpoint))
In [12]:
query = "SELECT COUNT(*) FROM li_violations"
counts = sql_client.send(query)
counts
C:\Users\dell\Anaconda3\envs\musa\lib\site-packages\pyrestcli\auth.py:79: UserWarning: You are using methods other than get with no authentication!!!
  warnings.warn(_("You are using methods other than get with no authentication!!!"))
Out[12]:
{'rows': [{'count': 1328549}],
 'time': 0.091,
 'fields': {'count': {'type': 'number'}},
 'total_rows': 1}

2.2 Get a single row of data for column inspection

Query the database API and limit your results to a single row. Create a GeoDataFrame from the query results and inspect the columns to identify the column that gives you the date of each code violation.

In [13]:
# select the first item
query = "SELECT * FROM li_violations LIMIT 1"
onerow = sql_client.send(query, format='geojson')
onerow
Out[13]:
{'type': 'FeatureCollection',
 'features': [{'type': 'Feature',
   'geometry': {'type': 'Point', 'coordinates': [-75.057887, 40.026692]},
   'properties': {'cartodb_id': 1,
    'objectid': 1,
    'addresskey': '556125',
    'opa_account_num': '552054300',
    'address': '4137 STIRLING ST',
    'unit': '',
    'zip': '19135-3025',
    'censustract': '320',
    'council_district': '6',
    'ownername': 'MURARESKU SHARON M',
    'organization': '',
    'casenumber': '163026',
    'aptype': 'CD ENFORCE',
    'caseaddeddate': '2008-06-02T17:43:03Z',
    'caseresolutiondate': None,
    'caseresolutioncode': 'RES',
    'apfailkey': 969093,
    'violationdate': '2008-06-02T00:00:00Z',
    'violationtype': 'CP-01',
    'violationdescription': 'CLIP VIOLATION NOTICE',
    'mostrecentinsp': '2008-06-17T07:34:00Z',
    'status': 'COMPLIED',
    'casestatus': 'CLOSED',
    'casegroup': 'CLIP',
    'casepriority': 'NH',
    'prioritydesc': 'NON HAZARDOUS',
    'geocode_x': 2722309.85706634,
    'geocode_y': 264071.53944553}}]}
In [14]:
# convert to geodataframe and check the columns
one = gpd.GeoDataFrame.from_features(onerow, crs={'init': 'epsg:4326'})
one.head()
Out[14]:
address addresskey apfailkey aptype cartodb_id caseaddeddate casegroup casenumber casepriority caseresolutioncode ... opa_account_num organization ownername prioritydesc status unit violationdate violationdescription violationtype zip
0 4137 STIRLING ST 556125 969093 CD ENFORCE 1 2008-06-02T17:43:03Z CLIP 163026 NH RES ... 552054300 MURARESKU SHARON M NON HAZARDOUS COMPLIED 2008-06-02T00:00:00Z CLIP VIOLATION NOTICE CP-01 19135-3025

1 rows × 29 columns

2.3 Get the data from 2012 to 2016

Using the name of the column identified in the previous section, query the database to get all data for years including 2012 through 2016 (inclusive), for 5 years worth of data. Create a GeoDataFrame from the query results.

Notes

  • The query operation might take a minute or two to execute.
  • The resulting GeoDataFrame should have between 400,000 and 500,000 rows.
In [15]:
query = "SELECT * FROM li_violations WHERE violationdate >= '2012-01-01T00:00:00Z' AND violationdate <= '2016-12-31T23:59:59Z'"
response = sql_client.send(query, format='geojson')
violations = gpd.GeoDataFrame.from_features(response, crs={'init': 'epsg:4326'})
violations.head()
Out[15]:
address addresskey apfailkey aptype cartodb_id caseaddeddate casegroup casenumber casepriority caseresolutioncode ... opa_account_num organization ownername prioritydesc status unit violationdate violationdescription violationtype zip
0 311 STATE ST 552337 2959460 CD ENFORCE 5 2013-10-04T13:01:02Z DW 405622 CONS CMPLY ... 241240900 ADORNO ALEXANDER C ADORNO MEGHAN CONSTRUCTION SERVICES COMPLIED 2013-10-03T00:00:00Z PERMB- ALTER REP'R EXT PORTION A-301.1/3 19104-0000
1 2022 E SUSQUEHANNA AVE 558682 2352716 PP_PLUMBNG 9 None 383275 ... 313027101 BRIX STACY L MCINTOSH JOEL T ERROR 2012-02-03T00:00:00Z UNAPPROVED MAT'L ETC A-203.1/1 19125-1526
2 2028 E SUSQUEHANNA AVE 558688 3000924 CD ENFORCE 10 2013-07-31T11:24:50Z DN 394623 CONS CMPLY ... 313027400 AGHASIYA ANJAN M VAGHASIYA MEGHANA A CONSTRUCTION SERVICES COMPLIED 2013-07-30T00:00:00Z PERM - MUST POST A-302.8/30 19125-1526
3 2038 E SUSQUEHANNA AVE 558695 4163350 CD ENFORCE 11 2016-10-24T10:16:49Z DE 560782 CONS CMPLY ... 313027900 ROBINSON PAUL NICHOLAS CONSTRUCTION SERVICES COMPLIED 2016-10-24T00:00:00Z PERMB- CONST ERECT NEW STRUCT A-301.1/1 19125-1542
4 5426 TACKAWANNA ST 563361 3362960 CD ENFORCE 19 2014-10-17T16:00:36Z CLIP 457241 NH CMPLY ... 622306100 PROIOS JOHN NON HAZARDOUS COMPLIED 2014-10-17T00:00:00Z CLIP VIOLATION NOTICE CP-01 19124-1411

5 rows × 29 columns

2.4 Remove any violations with missing locations

Check if any of the violations have missing data, and if so, trim these rows from the dataset.

Hints

  • Take a look at the isnull() and notnull() functions.
  • They can be applied to the geometry column to test for missing geometries.
In [16]:
# remove missing geometry
violations = violations.loc[violations['geometry'].notnull()]
violations.head()
Out[16]:
address addresskey apfailkey aptype cartodb_id caseaddeddate casegroup casenumber casepriority caseresolutioncode ... opa_account_num organization ownername prioritydesc status unit violationdate violationdescription violationtype zip
0 311 STATE ST 552337 2959460 CD ENFORCE 5 2013-10-04T13:01:02Z DW 405622 CONS CMPLY ... 241240900 ADORNO ALEXANDER C ADORNO MEGHAN CONSTRUCTION SERVICES COMPLIED 2013-10-03T00:00:00Z PERMB- ALTER REP'R EXT PORTION A-301.1/3 19104-0000
1 2022 E SUSQUEHANNA AVE 558682 2352716 PP_PLUMBNG 9 None 383275 ... 313027101 BRIX STACY L MCINTOSH JOEL T ERROR 2012-02-03T00:00:00Z UNAPPROVED MAT'L ETC A-203.1/1 19125-1526
2 2028 E SUSQUEHANNA AVE 558688 3000924 CD ENFORCE 10 2013-07-31T11:24:50Z DN 394623 CONS CMPLY ... 313027400 AGHASIYA ANJAN M VAGHASIYA MEGHANA A CONSTRUCTION SERVICES COMPLIED 2013-07-30T00:00:00Z PERM - MUST POST A-302.8/30 19125-1526
3 2038 E SUSQUEHANNA AVE 558695 4163350 CD ENFORCE 11 2016-10-24T10:16:49Z DE 560782 CONS CMPLY ... 313027900 ROBINSON PAUL NICHOLAS CONSTRUCTION SERVICES COMPLIED 2016-10-24T00:00:00Z PERMB- CONST ERECT NEW STRUCT A-301.1/1 19125-1542
4 5426 TACKAWANNA ST 563361 3362960 CD ENFORCE 19 2014-10-17T16:00:36Z CLIP 457241 NH CMPLY ... 622306100 PROIOS JOHN NON HAZARDOUS COMPLIED 2014-10-17T00:00:00Z CLIP VIOLATION NOTICE CP-01 19124-1411

5 rows × 29 columns

2.5 Trim to specific violation types

There are many different types of code violations (running the nunique() function on the violationdescription column will extract all of the unique ones). More information on different types of violations can be found on the City's website.

Below, I've selected 15 types of violations that deal with property maintenance and licensing issues. We'll focus on these violations. The goal is to see if these kinds of violations are correlated spatially with the number of evictions in a given area.

Use the list of violations given to trim your data set to only include these types.

In [17]:
violation_types = ['INT-PLMBG MAINT FIXTURES-RES',
 'INT S-CEILING REPAIR/MAINT SAN',
 'PLUMBING SYSTEMS-GENERAL',
 'CO DETECTOR NEEDED',
 'INTERIOR SURFACES',
 'EXT S-ROOF REPAIR',
 'ELEC-RECEPTABLE DEFECTIVE-RES',
 'INT S-FLOOR REPAIR',
 'DRAINAGE-MAIN DRAIN REPAIR-RES',
 'DRAINAGE-DOWNSPOUT REPR/REPLC',
 'LIGHT FIXTURE DEFECTIVE-RES',
 'LICENSE-RES SFD/2FD',
 'ELECTRICAL -HAZARD',
 'VACANT PROPERTIES-GENERAL',
 'INT-PLMBG FIXTURES-RES']
In [18]:
vio_trim = violations[violations['violationdescription'].isin(violation_types)]
vio_trim.head()
Out[18]:
address addresskey apfailkey aptype cartodb_id caseaddeddate casegroup casenumber casepriority caseresolutioncode ... opa_account_num organization ownername prioritydesc status unit violationdate violationdescription violationtype zip
29 4748 TAMPA ST 564180 3372654 CD ENFORCE 81 2014-09-19T07:48:08Z HCEU 453270 NH CMPLY ... 421575200 LOSCALZO NICHOLAS SANDRA NON HAZARDOUS COMPLIED 2014-09-18T00:00:00Z INT-PLMBG FIXTURES-RES PM-405.3/1 19120-4622
34 1532 S TANEY ST 564722 3300023 CD ENFORCE 92 2014-07-17T12:32:33Z HCEU 443053 NH CMPLY ... 364285900 RINICK JOHN RINICK ANGELA NON HAZARDOUS COMPLIED 2014-07-16T00:00:00Z INT S-FLOOR REPAIR PM-305.3/2 19146-4427
38 3024 N SWANSON ST 559648 3062572 CD ENFORCE 100 2013-06-04T09:37:41Z HCEU 383993 NH ... 071352900 SIENKO PETER SIENKO MARYLOU NON HAZARDOUS 2013-06-03T00:00:00Z LICENSE-RES SFD/2FD PM-102.3/1 19134-2814
50 3418 TAMPA ST 563969 3791537 CD ENFORCE 131 2015-06-09T08:15:03Z HCEU 484462 NH ... 331263700 SIENKO PETER SIENKO MARYLOU NON HAZARDOUS 2015-06-08T00:00:00Z EXT S-ROOF REPAIR PM-304.4/1 19134-1230
53 1517 S TANEY ST 564707 3308759 CD ENFORCE 134 2014-01-28T14:43:38Z HCEU 417489 NH CMPLY ... 364279500 FERRY REAL ESTATE COMPANY NON HAZARDOUS COMPLIED 2014-01-28T00:00:00Z INT S-CEILING REPAIR/MAINT SAN PM-305.3/3 19146-4426

5 rows × 29 columns

2.6 Make a hex bin map

The code violation data is point data. We can get a quick look at the geographic distribution using matplotlib and the hexbin() function. Make a hex bin map of the code violations and overlay the census tract outlines.

Hints:

  • The eviction data from part 1 was by census tract, so the census tract geometries are available as part of that GeoDataFrame. You can use it to overlay the census tracts on your hex bin map.
  • Make sure you convert your GeoDataFrame to a CRS that's better for visualization than plain old 4326.
In [19]:
from matplotlib import pyplot as plt

# project and convert evictions.crs to 3857
evictions.crs = {'init': 'epsg:4326'}
evictions = evictions.to_crs({'init': 'epsg:3857'})
print(evictions.crs)
{'init': 'epsg:3857'}
In [20]:
# convert violations to 3857
vio_trim = vio_trim.to_crs({'init': 'epsg:3857'})
print(vio_trim.crs)
{'init': 'epsg:3857'}
In [21]:
# matplotlib
# create axes
crs = ccrs.epsg('3857')
ax = plt.axes(projection=crs)

# hexbin
hexplot = ax.hexbin(vio_trim.geometry.x, vio_trim.geometry.y, gridsize=50)

# add census geometry
ax.add_geometries(evictions.geometry, crs=crs, facecolor='none', edgecolor='white', linewidth=0.1, alpha=0.3)

# adjust figure size
ax.figure.set_size_inches((13,13))

# add legend
legend = plt.colorbar(hexplot)
legend.ax.tick_params(labelsize=10)

2.7 Spatially join data sets

To do a census tract comparison to our eviction data, we need to find which census tract each of the code violations falls into. Use the geopandas.sjoin() function to do just that.

Hints

  • You can re-use your eviction data frame, but you will only need the geometry column (specifying census tract polygons) and the GEOID column (specifying the name of each census tract).
  • Make sure both data frames have the same CRS before joining them together!
In [22]:
# select census tracts geometries from evictions
census = evictions.iloc[:,0:2]
census.head()
Out[22]:
GEOID geometry
0 42101000100 (POLYGON ((-8364725.428628281 4859476.45937238...
1 42101000200 (POLYGON ((-8365795.87685175 4859675.561630982...
2 42101000300 (POLYGON ((-8367033.304311408 4859815.41483808...
3 42101000801 (POLYGON ((-8368701.315561454 4858819.05816059...
4 42101000804 (POLYGON ((-8368017.479929511 4858357.29140979...
In [23]:
# spatial join
joined = gpd.sjoin(vio_trim, census, op='within', how='left')
joined.head()
Out[23]:
address addresskey apfailkey aptype cartodb_id caseaddeddate casegroup casenumber casepriority caseresolutioncode ... ownername prioritydesc status unit violationdate violationdescription violationtype zip index_right GEOID
29 4748 TAMPA ST 564180 3372654 CD ENFORCE 81 2014-09-19T07:48:08Z HCEU 453270 NH CMPLY ... LOSCALZO NICHOLAS SANDRA NON HAZARDOUS COMPLIED 2014-09-18T00:00:00Z INT-PLMBG FIXTURES-RES PM-405.3/1 19120-4622 83 42101028902
29 4748 TAMPA ST 564180 3372654 CD ENFORCE 81 2014-09-19T07:48:08Z HCEU 453270 NH CMPLY ... LOSCALZO NICHOLAS SANDRA NON HAZARDOUS COMPLIED 2014-09-18T00:00:00Z INT-PLMBG FIXTURES-RES PM-405.3/1 19120-4622 467 42101028902
29 4748 TAMPA ST 564180 3372654 CD ENFORCE 81 2014-09-19T07:48:08Z HCEU 453270 NH CMPLY ... LOSCALZO NICHOLAS SANDRA NON HAZARDOUS COMPLIED 2014-09-18T00:00:00Z INT-PLMBG FIXTURES-RES PM-405.3/1 19120-4622 851 42101028902
29 4748 TAMPA ST 564180 3372654 CD ENFORCE 81 2014-09-19T07:48:08Z HCEU 453270 NH CMPLY ... LOSCALZO NICHOLAS SANDRA NON HAZARDOUS COMPLIED 2014-09-18T00:00:00Z INT-PLMBG FIXTURES-RES PM-405.3/1 19120-4622 1235 42101028902
29 4748 TAMPA ST 564180 3372654 CD ENFORCE 81 2014-09-19T07:48:08Z HCEU 453270 NH CMPLY ... LOSCALZO NICHOLAS SANDRA NON HAZARDOUS COMPLIED 2014-09-18T00:00:00Z INT-PLMBG FIXTURES-RES PM-405.3/1 19120-4622 1619 42101028902

5 rows × 31 columns

2.8 Calculate the number of violations by type per census tract

Next, we'll want to find the number of violations (for each kind) per census tract. You should group the data frame by violation type and census tract name.

The result of this step should be a data frame with three columns: violationdescription, GEOID, and N, where N is the number of violations of that kind in the specified census tract.

Optional: to make prettier plots

Some census tracts won't have any violations, and they won't be included when we do the above calculation. However, there is a trick to set the values for those census tracts to be zero. After you calculate the sizes of each violation/census tract group, you can run:

N = N.unstack(fill_value=0).stack().reset_index(name='N')

where N gives the total size of each of the groups, specified by violation type and census tract name.

See this StackOverflow post for more details.

This part is optional, but will make the resulting maps a bit prettier.

In [24]:
n_violations = joined.groupby(['GEOID', 'violationdescription']).size().unstack(fill_value=0).stack().reset_index(name='violation_count')
n_violations.head()
Out[24]:
GEOID violationdescription violation_count
0 42101000100 CO DETECTOR NEEDED 0
1 42101000100 DRAINAGE-DOWNSPOUT REPR/REPLC 84
2 42101000100 DRAINAGE-MAIN DRAIN REPAIR-RES 0
3 42101000100 ELEC-RECEPTABLE DEFECTIVE-RES 0
4 42101000100 ELECTRICAL -HAZARD 28

2.9 Merge with census tracts geometries

We now have the number of violations of different types per census tract specified as a regular DataFrame. You can now merge it with the census tract geometries (from your eviction data GeoDataFrame) to create a GeoDataFrame.

Hints

  • Use pandas.merge() and specify the on keyword to be the column holding census tract names.
  • Make sure the result of the merge operation is a GeoDataFrame — you will want the GeoDataFrame holding census tract geometries to be the first argument of the pandas.merge() function.
In [25]:
# The eviction dataset contains duplicated geometries (from 2003 to 2016), so I remove the duplicated ones by selecting census tracts in 2016
census16 = evictions.loc[evictions['year'] == '2016']
census16 = census16.iloc[:,0:2]
print(len(census16)) # there are 384 census tracts in 2016
384
In [26]:
# merge tracts in 2016 with violations
vio_merged = census16.merge(n_violations, on='GEOID')
vio_merged.head()
Out[26]:
GEOID geometry violationdescription violation_count
0 42101000100 (POLYGON ((-8364725.428628281 4859476.45937238... CO DETECTOR NEEDED 0
1 42101000100 (POLYGON ((-8364725.428628281 4859476.45937238... DRAINAGE-DOWNSPOUT REPR/REPLC 84
2 42101000100 (POLYGON ((-8364725.428628281 4859476.45937238... DRAINAGE-MAIN DRAIN REPAIR-RES 0
3 42101000100 (POLYGON ((-8364725.428628281 4859476.45937238... ELEC-RECEPTABLE DEFECTIVE-RES 0
4 42101000100 (POLYGON ((-8364725.428628281 4859476.45937238... ELECTRICAL -HAZARD 28

2.10 Interactive choropleths for each violation type

Now, we can use hvplot() to create an interactive choropleth for each violation type and add a widget to specify different violation types.

Hints

  • You'll need to use the groupby keyword to tell hvplot to make a series of maps, with a widget to select different violation types.
  • You will need to specify dynamic=False as a keyword argument to the hvplot() function.
  • Be sure to specify a width and height that makes your output map (roughly) square to limit distortions
In [27]:
vio_merged.hvplot(c='violation_count', groupby='violationdescription', crs=3857, width=500, height=430, dynamic=False)
Out[27]:

3. A side-by-side comparison

From the interactive maps of evictions and violations, you should notice a lot of spatial overlap.

As a final step, we'll make a side-by-side comparison to better show the spatial correlations. This will involve a few steps:

  1. Trim the data frame plotted in section 1.5 to only include evictions from 2016.
  2. Trim the data frame plotted in section 2.10 to only include a single violation type (pick whichever one you want!).
  3. Use hvplot() to make two interactive choropleth maps, one for the data from step 1. and one for the data in step 2.
  4. Show these two plots side by side (one row and 2 columns) using the syntax for combining charts.

Note: since we selected a single year and violation type, you won't need to use the groupby= keyword here.

In [28]:
# select evictions in 2016
evic16 = evictions.loc[evictions['year'] == '2016']
evic16.head()
Out[28]:
GEOID geometry year evictions_count
4992 42101000100 (POLYGON ((-8364725.428628281 4859476.45937238... 2016 16.0
4993 42101000200 (POLYGON ((-8365795.87685175 4859675.561630982... 2016 8.0
4994 42101000300 (POLYGON ((-8367033.304311408 4859815.41483808... 2016 14.0
4995 42101000801 (POLYGON ((-8368701.315561454 4858819.05816059... 2016 4.0
4996 42101000804 (POLYGON ((-8368017.479929511 4858357.29140979... 2016 7.0
In [29]:
# select violation type INTERIOR SURFACES
vio_type = vio_merged.loc[vio_merged['violationdescription'] == 'INTERIOR SURFACES']
vio_type.head()
Out[29]:
GEOID geometry violationdescription violation_count
10 42101000100 (POLYGON ((-8364725.428628281 4859476.45937238... INTERIOR SURFACES 14
25 42101000200 (POLYGON ((-8365795.87685175 4859675.561630982... INTERIOR SURFACES 56
40 42101000300 (POLYGON ((-8367033.304311408 4859815.41483808... INTERIOR SURFACES 14
55 42101000801 (POLYGON ((-8368701.315561454 4858819.05816059... INTERIOR SURFACES 0
70 42101000804 (POLYGON ((-8368017.479929511 4858357.29140979... INTERIOR SURFACES 0
In [30]:
# plot side by side
evic_plot = evic16.hvplot(c='evictions_count', crs=3857, width=500, height=430, dynamic=False)
vio_plot = vio_type.hvplot(c='violation_count', crs=3857, width=500, height=430, dynamic=False)

(evic_plot + vio_plot).cols(2)
Out[30]:

4. Extra Credit

Identify the 20 most common types of violations within the time period of 2012 to 2016 and create a set of interactive choropleths similar to what was done in section 2.10.

Use this set of maps to identify 3 types of violations that don't seem to have much spatial overlap with the number of evictions in the City.

In [31]:
# count the number of violations by types and sort descendingly
vio_most = violations.groupby(['violationdescription']).size().reset_index(name='count')
vio_most = vio_most.sort_values(by=['count'], ascending=False)
vio_most.head()
Out[31]:
violationdescription count
73 CLIP VIOLATION NOTICE 85939
344 EXT A-VACANT LOT CLEAN/MAINTAI 42801
759 HIGH WEEDS-CUT 27656
1189 RUBBISH/GARBAGE EXTERIOR-OWNER 15701
911 LICENSE-VAC RES BLDG 15314
In [32]:
# get the 20 most common types
vio_top20 = vio_most.iloc[0:21]
vio_top20
Out[32]:
violationdescription count
73 CLIP VIOLATION NOTICE 85939
344 EXT A-VACANT LOT CLEAN/MAINTAI 42801
759 HIGH WEEDS-CUT 27656
1189 RUBBISH/GARBAGE EXTERIOR-OWNER 15701
911 LICENSE-VAC RES BLDG 15314
1328 VACANT PROP STANDARD 12348
334 EXT A-CLEAN RUBBISH/GARBAGE 8845
908 LICENSE-RES SFD/2FD 8191
335 EXT A-CLEAN WEEDS/PLANTS 7914
905 LICENSE-RES GENERAL 7711
1323 VACANT BLDG UNSECURED COUNT 6618
1334 VACANT STRUCTURE LICENSE 6290
898 LICENSE - RENTAL PROPERTY 6021
1371 VIOL C&I MESSAGE 5910
78 CO DETECTOR NEEDED 5795
20 ANNUAL CERT FIRE ALARM 5786
805 INT S-CEILING REPAIR/MAINT SAN 5151
1200 SD-REQD EXIST GROUP R 4839
985 PERM Z- NEW USE 4557
1316 VAC PROP REPLAC WIN/DRS 80% 4150
406 EXTERIOR AREA SANITATION 4059
In [33]:
# filter the violation dataset by 20 most common types
common_types = vio_top20['violationdescription']
common_types

vio_filtered = violations.loc[violations['violationdescription'].isin(common_types)]
vio_filtered.head()
Out[33]:
address addresskey apfailkey aptype cartodb_id caseaddeddate casegroup casenumber casepriority caseresolutioncode ... opa_account_num organization ownername prioritydesc status unit violationdate violationdescription violationtype zip
4 5426 TACKAWANNA ST 563361 3362960 CD ENFORCE 19 2014-10-17T16:00:36Z CLIP 457241 NH CMPLY ... 622306100 PROIOS JOHN NON HAZARDOUS COMPLIED 2014-10-17T00:00:00Z CLIP VIOLATION NOTICE CP-01 19124-1411
5 1614 W SUSQUEHANNA AVE 559103 3890499 CD ENFORCE 20 2015-12-22T08:54:23Z HCEU 514795 NH ... 871144250 NICER HOUSING LLC NON HAZARDOUS 2015-12-21T00:00:00Z VACANT STRUCTURE LICENSE 9-3905 19121-1620
9 2044 E SUSQUEHANNA AVE 558701 4020992 CD ENFORCE 37 2016-07-25T13:35:03Z CLIP 546120 NH CMPLY ... 313028200 FOZAN EHMEDI COUSINS SUSQUEHANNA LLC NON HAZARDOUS COMPLIED 2016-07-25T00:00:00Z EXT A-VACANT LOT CLEAN/MAINTAI PM-302.2/4 19125-1542
10 3149 STIRLING ST 556079 4108090 CD ENFORCE 39 2016-10-07T16:37:36Z CLIP 558849 NH ABATE ... 551007100 POULIN PAUL M NON HAZARDOUS COMPLIED 2016-10-07T00:00:00Z HIGH WEEDS-CUT CP-312A 19149-3117
11 4062 TEESDALE ST 566624 3687656 CD ENFORCE 41 2015-08-24T15:31:15Z CLIP 498908 NH ABATE ... 412086100 WANG WEIPING ZHU JIE NON HAZARDOUS COMPLIED 2015-08-24T00:00:00Z HIGH WEEDS-CUT CP-312A 19136-3919

5 rows × 29 columns

In [34]:
# convert crs to 3857
vio_filtered = vio_filtered.to_crs({'init': 'epsg:3857'})
print(vio_filtered.crs)
{'init': 'epsg:3857'}
In [35]:
# spatial join the filtered violations and census tracts
joined2 = gpd.sjoin(vio_filtered, census16, op='within', how='left')
joined2.head()
Out[35]:
address addresskey apfailkey aptype cartodb_id caseaddeddate casegroup casenumber casepriority caseresolutioncode ... ownername prioritydesc status unit violationdate violationdescription violationtype zip index_right GEOID
4 5426 TACKAWANNA ST 563361 3362960 CD ENFORCE 19 2014-10-17T16:00:36Z CLIP 457241 NH CMPLY ... PROIOS JOHN NON HAZARDOUS COMPLIED 2014-10-17T00:00:00Z CLIP VIOLATION NOTICE CP-01 19124-1411 5080 42101029800
5 1614 W SUSQUEHANNA AVE 559103 3890499 CD ENFORCE 20 2015-12-22T08:54:23Z HCEU 514795 NH ... NICER HOUSING LLC NON HAZARDOUS 2015-12-21T00:00:00Z VACANT STRUCTURE LICENSE 9-3905 19121-1620 5322 42101015300
9 2044 E SUSQUEHANNA AVE 558701 4020992 CD ENFORCE 37 2016-07-25T13:35:03Z CLIP 546120 NH CMPLY ... COUSINS SUSQUEHANNA LLC NON HAZARDOUS COMPLIED 2016-07-25T00:00:00Z EXT A-VACANT LOT CLEAN/MAINTAI PM-302.2/4 19125-1542 5326 42101016100
10 3149 STIRLING ST 556079 4108090 CD ENFORCE 39 2016-10-07T16:37:36Z CLIP 558849 NH ABATE ... POULIN PAUL M NON HAZARDOUS COMPLIED 2016-10-07T00:00:00Z HIGH WEEDS-CUT CP-312A 19149-3117 5097 42101031600
11 4062 TEESDALE ST 566624 3687656 CD ENFORCE 41 2015-08-24T15:31:15Z CLIP 498908 NH ABATE ... WANG WEIPING ZHU JIE NON HAZARDOUS COMPLIED 2015-08-24T00:00:00Z HIGH WEEDS-CUT CP-312A 19136-3919 5105 42101033000

5 rows × 31 columns

In [36]:
# count the number of each violation in each census tract
vio_count = joined2.groupby(['GEOID', 'violationdescription']).size().unstack(fill_value=0).stack().reset_index(name='violation_counts')
vio_count.head()
Out[36]:
GEOID violationdescription violation_counts
0 42101000100 ANNUAL CERT FIRE ALARM 59
1 42101000100 CLIP VIOLATION NOTICE 6
2 42101000100 CO DETECTOR NEEDED 0
3 42101000100 EXT A-CLEAN RUBBISH/GARBAGE 4
4 42101000100 EXT A-CLEAN WEEDS/PLANTS 0
In [37]:
# merge with the census tract geometries
vio_merged = census16.merge(vio_count, on='GEOID')
vio_merged.head()
Out[37]:
GEOID geometry violationdescription violation_counts
0 42101000100 (POLYGON ((-8364725.428628281 4859476.45937238... ANNUAL CERT FIRE ALARM 59
1 42101000100 (POLYGON ((-8364725.428628281 4859476.45937238... CLIP VIOLATION NOTICE 6
2 42101000100 (POLYGON ((-8364725.428628281 4859476.45937238... CO DETECTOR NEEDED 0
3 42101000100 (POLYGON ((-8364725.428628281 4859476.45937238... EXT A-CLEAN RUBBISH/GARBAGE 4
4 42101000100 (POLYGON ((-8364725.428628281 4859476.45937238... EXT A-CLEAN WEEDS/PLANTS 0
In [38]:
# hvplot
vio_merged.hvplot(c='violation_counts', groupby='violationdescription', crs=3857, width=500, height=430, dynamic=False)
Out[38]:
In [39]:
# compare violations with evictions
evictions.hvplot(c='evictions_count', groupby='year', crs=3857, width=500, height=430, dynamic=False)
Out[39]:

By comparing each of the 20 violation types with the evictions in 2016, it looks like the 3 types that don't seem to be spatially overlapped with evictions are: EXT A-VACANT LOT CLEAN/MAINTAI, LICENSE-RES GENERAL, VIOL C&I MESSAGE.